Automatically Annotating the ODP Web Taxonomy

Authors

  • Christiana Christophi
  • Demetrios Zeinalipour-Yazti
  • Marios D. Dikaiakos
  • Georgios Paliouras
Abstract

This paper presents the ideas and algorithms behind KeyGen, our Web taxonomy annotation engine. KeyGen annotates the Open Directory Project (ODP), also known as Dmoz, with meaningful and previously unknown keywords by using domain knowledge extracted from the WWW. We present two algorithms: i) the PageParse Algorithm, which efficiently extracts keywords from Web taxonomies using a combination of local and global scores, and ii) the Support Algorithm, an I/O-optimized algorithm for coalescing hierarchies of keywords. We then report results from i) constructing a richly annotated ODP Web taxonomy and ii) evaluating the correctness of this structure through automated classification of Web pages.

Similar resources

Open Directory Project based universal taxonomy for Personalization of Online (Re)sources

Content personalization reflects the ability of content classification into (predefined) thematic units or information domains. Content nodes in a single thematic unit are related to a greater or lesser extent. An existing connection between two available content nodes assumes that the user will be interested in both resources (but not necessarily to the same extent). Such a connection (and its...

Utilizing global and path information with language modelling for hierarchical text classification

Hierarchical text classification of a Web taxonomy is challenging because it is a very large-scale problem with hundreds of thousands of categories and associated documents. Furthermore, the conceptual levels and training data availability of categories vary widely. The narrow-down approach is the state of the art; it utilizes a search engine for generating candidates from the taxonomy and build...

Automatic Topic Ontology Construction Using Semantic Relations from WordNet and Wikipedia

Due to the explosive growth of web technology, a huge amount of information is available as web resources over the Internet. Therefore, in order to access the relevant content from these web resources effectively, considerable attention is paid to the semantic web for efficient knowledge sharing and interoperability. Topic ontology is a hierarchy of a set of topics that are interconnected using s...

Ontorat: automatic generation of new ontology terms, annotations, and axioms based on ontology design patterns

Background: It is time-consuming to build an ontology with many terms and axioms. Thus it is desired to automate the process of ontology development. Ontology Design Patterns (ODPs) provide a reusable solution to solve a recurrent modeling problem in the context of ontology engineering. Because ontology terms often follow specific ODPs, the Ontology for Biomedical Investigations (OBI) developers...

Publication date: 2007